Attention Is All You Need
https://arxiv.org/abs/1706.03762
Extends the attention mechanism in an original way and proposes the Transformer model architecture
Transformer models are flourishing today (e.g. (on the to-read pile) A Survey of Transformers (2021))
Self-attention
https://github.com/tensorflow/tensor2tensor#walkthrough
tensor2tensor
Translation is one example of a sequence transduction task
Abstract
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
Proposes a new, simple network architecture: the Transformer
The Transformer is based solely on the attention mechanism, dispensing with recurrent and convolutional networks entirely
Machine translation tasks
Outperformed the existing best results
Parallelizable (ref: 8.5.2 Transformer); see the attention sketch below
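To make "based solely on attention mechanisms" and "parallelizable" concrete, here is a minimal NumPy sketch of the scaled dot-product attention defined in Section 3.2.1, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The function name, array shapes, and toy sizes are illustrative assumptions, not taken from the paper's tensor2tensor code.
```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (Section 3.2.1 of the paper).

    Q, K: (seq_len, d_k), V: (seq_len, d_v).
    Every position attends to every position in a single matrix product,
    which is why the computation parallelizes across the sequence,
    unlike a recurrent layer that must step through tokens one by one.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # (seq_len, seq_len) logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                         # weighted sum of values

# Toy self-attention: queries, keys, and values all come from the same sequence.
x = np.random.randn(5, 8)   # 5 tokens, d_model = 8 (illustrative sizes)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)            # (5, 8)
```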
7. Conclusion
In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.
For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers.
In other words, on translation tasks the Transformer trains significantly faster than recurrent- or convolution-based architectures
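To make the conclusion's "multi-headed self-attention" concrete, here is a hedged NumPy sketch following the paper's definition MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O with head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) (Section 3.2.2). The weights are randomly initialized and the function name and dimensions are illustrative assumptions, not the paper's implementation.
```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, h=2, seed=0):
    """Multi-head self-attention in the style of Section 3.2.2 (toy weights).

    x: (seq_len, d_model); d_model must be divisible by the number of heads h.
    Each head projects x down to d_k = d_model / h dimensions, runs scaled
    dot-product attention, and the concatenated heads are projected by W_O.
    """
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    d_k = d_model // h
    heads = []
    for _ in range(h):
        # Per-head projections W_Q, W_K, W_V (randomly initialized for illustration).
        W_Q, W_K, W_V = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        Q, K, V = x @ W_Q, x @ W_K, x @ W_V
        scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len)
        heads.append(softmax(scores) @ V)      # (seq_len, d_k)
    W_O = rng.standard_normal((d_model, d_model))
    return np.concatenate(heads, axis=-1) @ W_O  # (seq_len, d_model)

out = multi_head_self_attention(np.random.randn(5, 8), h=2)
print(out.shape)  # (5, 8)
```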
3 Model Architecture
4 Why Self-Attention
5 Training (Attention Is All You Need)
TODO Attention Visualization (Appendix) Figure 3